A "stereo" document representation for textual information retrieval

Authors

Liang Chen

Jia Zeng

Naoyuki Tokuda

Abstract

The DLSI (differential latent semantic indexing) approach, which uses two term vectors for each document, significantly improves on the LSI (latent semantic indexing) approach in textual information retrieval. Encouraged by this result, we propose the concept of a stereo, or multi-perspective, document representation, which we expect to be effective for most textual information retrieval approaches based on the vector space model. We show that the new representation, built from two or more “pictures” of each document taken from different viewing angles, improves textual document retrieval by capturing more of each document's individual features. A Student's t-test on experimental results from the standard Time and ADI corpora shows that LSI and standard term vector algorithms based on the multi-perspective representation perform significantly better than those based on the traditional single-document representation.
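The abstract does not say how the two "pictures" of a document are built or how they are combined at query time, so the following is only a toy sketch of the general idea in a plain vector space model, not the authors' method. It assumes, purely for illustration, that the two views are raw term-count vectors of the first and second half of a document and that a query is scored against the better-matching view; the helper names `two_views` and `stereo_score` are hypothetical.

```python
import math
from collections import Counter


def term_vector(tokens):
    """Raw term-frequency vector as a sparse mapping term -> count."""
    return Counter(tokens)


def cosine(a, b):
    """Cosine similarity between two sparse term vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def two_views(text):
    """ASSUMPTION: the two 'pictures' are the first and second half of the text."""
    tokens = text.lower().split()
    mid = len(tokens) // 2
    return term_vector(tokens[:mid]), term_vector(tokens[mid:])


def single_score(query, doc):
    """Baseline: one term vector per document."""
    return cosine(term_vector(query.lower().split()),
                  term_vector(doc.lower().split()))


def stereo_score(query, doc):
    """ASSUMPTION: score a document by its best-matching view."""
    q = term_vector(query.lower().split())
    view_a, view_b = two_views(doc)
    return max(cosine(q, view_a), cosine(q, view_b))


if __name__ == "__main__":
    doc = ("latent semantic indexing reduces noise "
           "fresh basil improves pasta sauce")
    query = "semantic indexing"
    print("single view:", single_score(query, doc))
    print("two views:  ", stereo_score(query, doc))
```

In this toy case the query matches only one half of the document, so the per-view score is higher than the single-vector score, which is diluted by the unrelated half. Whether a split of this kind helps on real corpora is exactly what the paper's experiments on Time and ADI address; the sketch makes no claim about that.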

Similar resources

Document Title Patterns in Information Retrieval

Document titles give important information about documents, which is why they are frequently used to obtain document keywords. We use them to determine document intentions. To obtain some textual details, we use special information extraction techniques to construct extra-topical representations of the documents. This representation reflects a document more completely. A possib...

A New Document Embedding Method for News Classification

Text classification is one of the main tasks of natural language processing (NLP). In this task, documents are classified into pre-defined categories. A great deal of news spreads on the web. A text classifier can categorize news automatically, which facilitates and accelerates access to the news. The first step in text classification is to represent documents in a suitable way t...

Improved Skips for Faster Postings List Intersection

Information retrieval can be achieved through computerized processes by generating a list of relevant responses to a query. The document processor, matching function, and query analyzer are the main components of an information retrieval system. Document retrieval systems are fundamentally based on Boolean, vector-space, probabilistic, and language models. In this paper, a new methodology for mat...

Effectiveness of additional representations for the search result presentation on the web

The presentation of search results on the web has been dominated by the textual form of document representation. On the other hand, the document’s visual aspects, such as the layout, colour scheme, or presence of images, have been studied only in a limited context with regard to their effectiveness for search result presentation. This article presents a comparative evaluation of textual and visual form...

Journal:

JASIST

Volume 57

Publication date: 2006
